The Effect of Missing Data on Repeated Measures Models
ABSTRACT
Researchers involved with longitudinal studies face the problem of getting study subjects to return for every follow-up visit, so these studies always contain some amount of missing data. The MIXED procedure of SAS enables examination of correlational structures and variability changes between repeated measurements on experimental units across time. While PROC MIXED can handle unbalanced data when the data are missing at random, a question arises as to when the degree of sparseness jeopardizes inference. Simulation is a tool that can be used to answer these types of questions. This paper shows the application of simulation to determine inference problems in a data set with a specific pattern of missing data. This technique is also applied to the topic of initial sample size determination.

INTRODUCTION

Researchers at the Medical College of Georgia have been collecting data on and studying children from families with a history of hypertension for a number of years. A measurement of interest is the systolic blood pressure (SBP) measurement obtained from a monitor that the child wears for 24 hours. SBP measurements are obtained every 20 minutes from 6am to 10pm and every 30 minutes during the night. Daytime and nighttime means are calculated and used in analysis.

Because of the nature of these measurements, not all children in the study consent to wear an ambulatory BP monitor, and those who consent do not do so every year of the study. When they do wear the monitor, technical problems may result in an insufficient number of readings for analysis. The result was a small and sparse data set when four consecutive years of data were examined. Table 1 shows the frequency and percent of the 92 children who had at least two of four measurements and the years in which the measurements were obtained. The data set is only 57% complete.

Table 1. Unbalanced data structure
________________________________________
Y_1  Y_2  Y_3  Y_4   Frequency   Percent
________________________________________
 1    .    .    4        14        15.2
 1    .    3    .        24        26.1
 1    .    3    4         4         4.3
 1    2    .    .        29        31.5
 1    2    .    4         6         6.5
 1    2    3    .        13        14.1
 1    2    3    4         2         2.2
________________________________________
A digit marks a year in which a measurement was obtained; a period marks a missing measurement.

When these data were analyzed using PROC MIXED, the preferred error variance-covariance (V-C) structure was determined to be compound symmetry (CS). Toeplitz (TOEP) or autoregressive (AR(1)) would seem to be more intuitive given that the measurements are separated by a year: correlations between measurements separated by more time are expected to be lower than those between measurements obtained closer together. However, since 67 of 92 (73%) of the individuals were represented by only 2 data points, it is not surprising that CS was selected, since that is all that could be estimated. PROC MIXED does not impute missing data but uses the data that are present to estimate the error structure of the random effects.

We thought that an objective example for the researcher was needed to show the inherent problems associated with such incomplete data. That motivation, along with the desire to determine the effect of more reasonable patterns of missing data on error structure estimation, prompted us to undertake a simulation project. Simulation was used to investigate the possibility that the inferences obtained from the MIXED analyses might be due to the small sample size and/or sparseness of the data. It was also used to investigate the sample sizes needed to make correct determinations of the assumed underlying structure.

SIMULATION

The simulation problem was to generate samples of various sizes from a 4-variable multivariate normal distribution with specified mean vector and variance-covariance matrix. In a separate study of SBP in children it was determined that the mean±SD for SBP in each of 4 years was 110±10 mmHg. The correlation between measurements separated by 1 year was 0.70, by 2 years 0.60, and by 3 years 0.48, following a TOEP covariance structure. Thus, the samples to be generated are of the form
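The sampling setup described in the text can be sketched in code. The following is a minimal illustration in Python using only the standard library; it is not the authors' SAS program. The mean, SD, lag correlations, and missing-data patterns come from the text and Table 1. All function and variable names (`toeplitz_cov`, `cholesky`, `draw_subject`, `simulate_dataset`) are hypothetical, and drawing through a hand-written Cholesky factor is an assumed method, chosen only to keep the sketch self-contained.

```python
import math
import random

# Values from the text: mean 110 mmHg and SD 10 mmHg in each of 4 years,
# with correlations 0.70, 0.60, 0.48 at lags of 1, 2, 3 years (Toeplitz).
MEAN = [110.0] * 4
SD = 10.0
LAG_CORR = [1.0, 0.70, 0.60, 0.48]

# Missing-data patterns and frequencies from Table 1 (True = observed).
PATTERNS = [
    ((True, False, False, True), 14),
    ((True, False, True, False), 24),
    ((True, False, True, True), 4),
    ((True, True, False, False), 29),
    ((True, True, False, True), 6),
    ((True, True, True, False), 13),
    ((True, True, True, True), 2),
]


def toeplitz_cov(lag_corr, sd):
    """Covariance matrix whose (i, j) entry depends only on the lag |i - j|."""
    n = len(lag_corr)
    return [[sd * sd * lag_corr[abs(i - j)] for j in range(n)] for i in range(n)]


def cholesky(a):
    """Lower-triangular L with L L^T = a (a must be positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def draw_subject(mean, low, rng):
    """One 4-year SBP profile: mean + L z, with z independent standard normals."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


def simulate_dataset(seed=0):
    """Draw 92 complete profiles, then blank out years per the Table 1 patterns."""
    rng = random.Random(seed)
    low = cholesky(toeplitz_cov(LAG_CORR, SD))
    data = []
    for pattern, freq in PATTERNS:
        for _ in range(freq):
            y = draw_subject(MEAN, low, rng)
            data.append([v if seen else None for v, seen in zip(y, pattern)])
    return data


data = simulate_dataset()
observed = sum(v is not None for row in data for v in row)
fraction_complete = observed / (4 * len(data))
print(len(data), observed, round(fraction_complete, 2))
```

Because missingness here is assigned by pattern and does not depend on the simulated values, the sketch is consistent with the missing-at-random setting mentioned above. Repeating `simulate_dataset` over many seeds and fitting competing covariance structures to each replicate would be one way to tally how often the true TOEP structure is recovered; that fitting step is outside this sketch.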